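Output length is critical to dialogue summarization systems. The length of a dialogue summary is determined by multiple factors, including dialogue complexity, summary objective, and personal preferences. In this work, we approach dialogue summary length from three perspectives. First, we analyze the length differences between existing models' outputs and the corresponding human references, and find that summarization models tend to produce more verbose summaries because of their pretraining objectives. Second, we identify salient features for summary length prediction by comparing different model settings. Third, we experiment with a length-aware summarizer and show notable improvement over existing models if summary length can be well incorporated. Analysis and experiments are conducted on the popular DialogSum and SAMSum datasets to validate our findings.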
Translated by Google Translate
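Word Mover's Distance (WMD) computes the distance between words and models text similarity as the cost of moving the words of one text sequence to those of another. However, it does not perform well in sentence similarity evaluation, because it does not incorporate word importance and fails to take the inherent contextual and structural information of a sentence into account. This work proposes an improved WMD method that uses the syntactic parse tree, called Syntax-aware Word Mover's Distance (SynWMD), to address these two shortcomings. First, a weighted graph is built from the word co-occurrence statistics extracted from the syntactic parse trees of sentences, and the importance of each word is inferred from graph connectivity. Second, the local syntactic parsing structure of words is considered when computing the distance between words. To demonstrate the effectiveness of the proposed SynWMD, we conduct experiments on 6 semantic textual similarity (STS) datasets and 4 sentence classification datasets. Experimental results show that SynWMD achieves state-of-the-art performance on STS tasks. It also outperforms other WMD-based methods on sentence classification tasks.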
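The role of word importance in a WMD-style distance can be illustrated with a small sketch. This is not SynWMD itself: it uses the relaxed (nearest-neighbour) variant of WMD instead of solving the full transport problem, and the 2-d embeddings and importance weights are hypothetical hand-picked values, where SynWMD would derive the weights from parse-tree co-occurrence graphs.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_cost(src, dst, emb, weight):
    """Weighted cost of moving every word in src to its nearest word in dst."""
    total_w = sum(weight.get(w, 1.0) for w in src)
    cost = 0.0
    for w in src:
        nearest = min(euclidean(emb[w], emb[t]) for t in dst)
        cost += weight.get(w, 1.0) / total_w * nearest
    return cost


def relaxed_wmd(s1, s2, emb, weight=None):
    """Relaxed WMD: the larger of the two one-sided nearest-neighbour costs."""
    weight = weight or {}  # missing weights default to 1.0 (uniform importance)
    return max(one_sided_cost(s1, s2, emb, weight),
               one_sided_cost(s2, s1, emb, weight))


# Hypothetical toy embeddings and importance weights (not from the paper).
EMB = {"the": (0.0, 0.0), "cat": (1.0, 0.0), "dog": (1.0, 0.5),
       "sat": (0.0, 1.0), "ran": (0.5, 1.0)}
WEIGHT = {"the": 0.1, "cat": 1.0, "dog": 1.0, "sat": 0.8, "ran": 0.8}

A = ["the", "cat", "sat"]
B = ["the", "dog", "ran"]
d_same = relaxed_wmd(A, A, EMB, WEIGHT)
d_weighted = relaxed_wmd(A, B, EMB, WEIGHT)
d_uniform = relaxed_wmd(A, B, EMB)
```

Down-weighting the function word "the" makes the content words dominate the distance, so `d_weighted` is larger than `d_uniform` for this pair.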
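Locating diseases in chest X-ray images with few careful annotations saves a large amount of human effort. Recent works have approached this task with innovative weakly supervised algorithms such as multi-instance learning (MIL) and class activation maps (CAM); however, these methods often yield inaccurate or incomplete regions. One reason is that they neglect the pathological implications hidden in the relationships across anatomical regions within each image and in the relationships across images. In this paper, we argue that cross-region and cross-image relationships, as contextual and compensating information, are vital for obtaining more consistent and complete regions. To model these relationships, we propose the Graph Regularized Embedding Network (GREN), which leverages intra-image and inter-image information to locate diseases on chest X-ray images. GREN uses a pre-trained U-Net to segment the lung lobes, and then models the intra-image relationships between the lung lobes with an intra-image graph to compare different regions. Meanwhile, the relationships between in-batch images are modeled by an inter-image graph to compare multiple images. This process mimics the training and decision-making process of a radiologist: comparing multiple regions and images for diagnosis. So that the deep embedding layers of the neural network retain structural information (which is important in the localization task), we use hash coding and the Hamming distance to compute the graphs, which serve as regularizers to facilitate training. With this design, our approach achieves state-of-the-art results on the NIH chest X-ray dataset for weakly supervised disease localization. Our code is available online (https://github.com/qibaolian/gren).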
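As a rough illustration of how hash codes and the Hamming distance can turn embeddings into a graph, the sketch below binarises each embedding by sign and uses the normalised Hamming similarity as the edge weight. The sign-based hashing, the function names, and the toy region embeddings are all assumptions for illustration; the abstract does not specify how GREN computes its codes or graphs.

```python
def hash_code(embedding):
    """Binarise an embedding by sign: 1 where the value is positive, else 0."""
    return [1 if x > 0 else 0 for x in embedding]


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def similarity_graph(embeddings):
    """Dense adjacency matrix with entries 1 - hamming / code_length."""
    codes = [hash_code(e) for e in embeddings]
    n_bits = len(codes[0])
    return [[1.0 - hamming(ci, cj) / n_bits for cj in codes] for ci in codes]


# Hypothetical 4-d embeddings for three image regions.
regions = [
    [0.9, -0.2, 0.4, -0.7],
    [0.8, -0.1, 0.3, 0.6],
    [-0.5, 0.7, -0.9, 0.2],
]
graph = similarity_graph(regions)
```

Here the first two regions differ in one bit out of four, so their edge weight is 0.75, while the first and third differ in every bit and get weight 0.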
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, if at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, inputs, network regularization, and sequential distillation, and find that: 1) distilling token relations is more effective than CLS-token-based and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student differs from that of the teacher; 3) weak regularization is preferred. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification for the ViT-Tiny, ViT-Small, and ViT-Base models, with gains of +4.2%/+2.4%/+1.4%, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
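One appeal of distilling token relations rather than raw features is that a relation matrix has shape (tokens x tokens) regardless of feature width, so a narrow student can be matched to a wide teacher without a projection layer. The sketch below illustrates this with a softmax over scaled dot products and a KL loss. It is an assumed simplification: the function names and toy token values are hypothetical, and the exact relations used in the paper are not given in the abstract.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_relations(tokens):
    """Row-wise softmax of scaled dot products between all token pairs."""
    scale = math.sqrt(len(tokens[0]))
    return [
        softmax([sum(a * b for a, b in zip(ti, tj)) / scale for tj in tokens])
        for ti in tokens
    ]


def relation_kl(teacher_tokens, student_tokens):
    """Mean KL(teacher || student) over the rows of the two relation matrices."""
    t_rel = token_relations(teacher_tokens)
    s_rel = token_relations(student_tokens)
    total = 0.0
    for t_row, s_row in zip(t_rel, s_rel):
        total += sum(p * math.log(p / q) for p, q in zip(t_row, s_row) if p > 0)
    return total / len(t_rel)


# Hypothetical features: three tokens, teacher width 4, student width 2.
teacher = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
student = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

loss = relation_kl(teacher, student)
self_loss = relation_kl(teacher, teacher)
```

Both relation matrices are 3 x 3 even though the feature widths differ, and the loss vanishes when the student reproduces the teacher's relations exactly.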
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from a trade-off between accuracy and efficiency. Recent advances in neural operators, a class of mesh-independent neural-network-based PDE solvers, suggest that this challenge may be overcome. In this emerging direction, the Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and on ERA5 (one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab in diverse applications of partial differential equations.
In this chapter, we review and discuss how AI technology is transforming HCI/UX work and assess how it will change the way that work is done. We first discuss how AI can be used to enhance the results of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
Adversarial robustness assessment for video recognition models has raised concerns owing to their wide use in safety-critical tasks. Compared with images, videos have much higher dimensionality, which brings huge computational costs when generating adversarial videos. This is especially serious for query-based black-box attacks, where gradient estimation for the threat models is usually used and high dimensionality leads to a large number of queries. To mitigate this issue, we propose to simultaneously eliminate the temporal and spatial redundancy within the video to achieve effective and efficient gradient estimation on the reduced search space, thereby decreasing the number of queries. To implement this idea, we design the novel Adversarial spatial-temporal Focus (AstFocus) attack on videos, which attacks simultaneously focused key frames and key regions selected across and within the frames of the video. The AstFocus attack is based on the cooperative Multi-Agent Reinforcement Learning (MARL) framework. One agent is responsible for selecting key frames, and another agent is responsible for selecting key regions. These two agents are jointly trained with the common rewards received from the black-box threat models to perform a cooperative prediction. Through continued querying, the reduced search space composed of key frames and key regions becomes more precise, and the total number of queries becomes smaller than that required on the original video. Extensive experiments on four mainstream video recognition models and three widely used action recognition datasets demonstrate that the proposed AstFocus attack outperforms the SOTA methods, leading in fooling rate, query number, time, and perturbation magnitude at the same time.
Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.
Rankings are widely collected in various real-life scenarios, leading to the leakage of personal information such as users' preferences on videos or news. To protect rankings, existing works mainly develop privacy protection on a single ranking within a set of rankings or on pairwise comparisons of a ranking under $\epsilon$-differential privacy. This paper proposes a novel notion called $\epsilon$-ranking differential privacy for protecting ranks. We establish the connection between the Mallows model (Mallows, 1957) and the proposed $\epsilon$-ranking differential privacy. This allows us to develop a multistage ranking algorithm that generates synthetic rankings while satisfying $\epsilon$-ranking differential privacy. Theoretical results regarding the utility of synthetic rankings in downstream tasks, including the inference attack and the personalized ranking task, are established. For the inference attack, we quantify how $\epsilon$ affects the estimation of the true ranking based on synthetic rankings. For the personalized ranking task, we consider varying privacy preferences among users and quantify how their privacy preferences affect the consistency in estimating the optimal ranking function. Extensive numerical experiments are carried out to verify the theoretical results and demonstrate the effectiveness of the proposed synthetic ranking algorithm.
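For readers unfamiliar with the Mallows model, the sketch below samples rankings with probability proportional to exp(-theta * d(r, r0)), where d is the Kendall tau distance to a reference ranking r0, using the standard repeated-insertion construction. It is only an illustration of the model: the function names are hypothetical, this is not the paper's multistage algorithm, and the mapping between the dispersion parameter `theta` and the privacy level $\epsilon$ is not specified here.

```python
import math
import random


def kendall_tau(a, b):
    """Number of item pairs that the two rankings order differently."""
    pos = {item: i for i, item in enumerate(b)}
    seq = [pos[item] for item in a]
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])


def sample_mallows(reference, theta, rng):
    """Draw one ranking with P(r) proportional to exp(-theta * kendall_tau(r, reference))."""
    phi = math.exp(-theta)
    ranking = []
    for i, item in enumerate(reference, start=1):
        # Inserting the i-th item at 0-based slot j puts it ahead of
        # (i - 1 - j) earlier items, i.e. creates that many inversions.
        weights = [phi ** (i - 1 - j) for j in range(i)]
        slot = rng.choices(range(i), weights=weights)[0]
        ranking.insert(slot, item)
    return ranking


rng = random.Random(0)
reference = ["a", "b", "c", "d", "e"]
noisy = sample_mallows(reference, theta=0.5, rng=rng)
exact = sample_mallows(reference, theta=1000.0, rng=rng)
```

A small `theta` yields rankings spread widely around the reference (uniform at `theta = 0`), while a large `theta` concentrates the samples on the reference itself.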
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
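The greedy policy described in the dynamic feature selection abstract can be sketched for the idealised case where the data distribution is known exactly. The example below assumes oracle access to a small discrete joint distribution, which is precisely what the abstract says is unavailable in practice, so it illustrates the target policy rather than the learned, amortized method. The toy distribution and all function names are hypothetical.

```python
import math


def condition(joint, observed):
    """Keep outcomes consistent with the observed feature values and renormalise."""
    kept = [(p, x, y) for p, x, y in joint
            if all(x[i] == v for i, v in observed.items())]
    total = sum(p for p, _, _ in kept)
    return [(p / total, x, y) for p, x, y in kept]


def mutual_information(joint, i):
    """I(Y; X_i) in nats under the given (already conditioned) joint."""
    p_xy, p_x, p_y = {}, {}, {}
    for p, x, y in joint:
        p_xy[(x[i], y)] = p_xy.get((x[i], y), 0.0) + p
        p_x[x[i]] = p_x.get(x[i], 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(p * math.log(p / (p_x[a] * p_y[b]))
               for (a, b), p in p_xy.items() if p > 0)


def greedy_next_feature(joint, observed, n_features):
    """Unobserved feature with the largest mutual information given the observations."""
    cond = condition(joint, observed)
    candidates = [i for i in range(n_features) if i not in observed]
    return max(candidates, key=lambda i: mutual_information(cond, i))


def toy_joint(p_x0_zero=0.7):
    """Hypothetical distribution: y copies x1 when x0 == 0 and x2 when x0 == 1."""
    joint = []
    for x0 in (0, 1):
        for x1 in (0, 1):
            for x2 in (0, 1):
                p = (p_x0_zero if x0 == 0 else 1.0 - p_x0_zero) * 0.25
                y = x1 if x0 == 0 else x2
                joint.append((p, (x0, x1, x2), y))
    return joint


JOINT = toy_joint()
first = greedy_next_feature(JOINT, {}, 3)
after_x0_is_0 = greedy_next_feature(JOINT, {0: 0}, 3)
after_x0_is_1 = greedy_next_feature(JOINT, {0: 1}, 3)
```

In this toy problem the best next query depends on what has already been seen: once `x0` is known, the policy asks for `x1` if `x0 == 0` and for `x2` otherwise, which a static feature subset cannot express.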